What AI Can’t Do

AGI

Predict and Simulate the Future

GPT-based technology is inherently unable to modify itself based on “simulations of the future”. If a decision depends on mulling alternatives in order to choose the best or most likely one, ChatGPT will fail spectacularly.

Sébastien Bubeck, a Microsoft researcher, offers these examples:[1]

  • Towers of Hanoi is a well-known computer science problem that involves moving a stack of disks among three rods. A human can easily “guess ahead” and avoid moves with bad consequences; GPT is stuck with whatever guess happens to match its fixed training data. (A recursive solution is sketched after this list.)
  • Write a short poem in which the last line uses the same words as the first line, but in reverse order.
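For contrast, a classical program handles the lookahead trivially. Here is a minimal sketch (mine, not Bubeck’s) of the standard recursive Towers of Hanoi solution, which plans the entire move sequence before the first disk is touched:

```python
# Classic recursive Towers of Hanoi: the whole move sequence is derived
# up front, rather than guessed one token at a time.
def hanoi(n, source, target, spare, moves=None):
    """Append the moves that transfer n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
    else:
        hanoi(n - 1, source, spare, target, moves)   # clear the way
        moves.append((source, target))               # move the largest disk
        hanoi(n - 1, spare, target, source, moves)   # restack on top
    return moves

print(hanoi(3, "A", "C", "B"))  # 2**3 - 1 = 7 moves
```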

A paper at the 2023 NeurIPS conference pointed to several other “commonsense planning tasks”, easily solved by people, who can consider the effect future moves have on current decisions.

But Noam Brown, who recently left Meta’s AI division to join OpenAI, invented new algorithms that could win at poker (Libratus, Pluribus) and later at Diplomacy (Cicero), a Risk-like board game that involves manipulating other players, often through deception.

Backtracking

ChatGPT and other LLMs have trouble with anything that involves re-processing something that has already been generated. They can’t write a palindrome, for example.

“Write a sentence that describes its own length in words”

GPT-4 gets around this by recognizing the pattern and then generating a simple Python script to fill it in.
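The kind of script GPT-4 might produce is simple; a minimal sketch (mine, not an OpenAI example) searches for the number word that makes the sentence’s claim true:

```python
# Search for a number word that makes the sentence's stated word count
# match its actual word count -- the "re-processing" step an LLM can't do
# in a single left-to-right pass.
NUMBER_WORDS = {5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine"}

def self_describing_sentence():
    for n, word in NUMBER_WORDS.items():
        candidate = f"This sentence contains exactly {word} words."
        if len(candidate.split()) == n:
            return candidate
    raise ValueError("no self-consistent sentence found in the range tried")

print(self_describing_sentence())  # "This sentence contains exactly six words."
```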

Inference

LLMs are fundamentally incapable of even simple logical inference, as demonstrated in “The Reversal Curse” (Berglund et al. 2023): models trained on “A is B” fail to learn “B is A”. For example, a model that learns “Tom Cruise’s mother is Mary Lee Pfeiffer” may still fail to answer “Who is Mary Lee Pfeiffer’s son?”

More Examples

A TED2023 Talk by Yejin Choi “Why AI is incredibly smart and shockingly stupid” offers examples of stupidity:

  • “You have a 12 liter jug and a 6 liter jug. How can you pour exactly 6 liters?” ChatGPT gives a convoluted multi-step answer instead of simply filling the 6 liter jug. (A brute-force search sketch follows this list.)
  • If I bike on a suspension bridge over a field of broken glass, nails, and sharp objects, will I get a flat tire?
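The jug puzzle is mechanically easy for a classical search, which is what makes the convoluted answer so striking. A minimal breadth-first search sketch (mine, not from the talk):

```python
# A breadth-first search over jug states (amount_in_12L, amount_in_6L):
# classical planning finds the one-step answer immediately.
from collections import deque

def pour_exactly(target=6, caps=(12, 6)):
    start = (0, 0)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (a, b), path = queue.popleft()
        if target in (a, b):
            return path
        successors = {
            "fill 12L": (caps[0], b),
            "fill 6L": (a, caps[1]),
            "empty 12L": (0, b),
            "empty 6L": (a, 0),
            "pour 12L->6L": (a - min(a, caps[1] - b), b + min(a, caps[1] - b)),
            "pour 6L->12L": (a + min(b, caps[0] - a), b - min(b, caps[0] - a)),
        }
        for move, state in successors.items():
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [move]))
    return None

print(pour_exactly())  # shortest plan: ['fill 6L']
```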

Yann LeCun (@ylecun), March 14, 2024:

To people who claim that “thinking and reasoning require language”, here is a problem: Imagine standing at the North Pole of the Earth.

Walk in any direction, in a straight line, for 1 km.

Now turn 90 degrees to the left. Walk for as long as it takes to pass your starting point.

Have you walked:

  1. More than 2xPi km
  2. Exactly 2xPi km
  3. Less than 2xPi km
  4. I never came close to my starting point.

Think about how you tried to answer this question and tell us whether it was based on language.

Raji et al. (2022): “Despite the current public fervor over the great potential of AI, many deployed algorithmic products do not work.” Although written before ChatGPT, this lengthy paper includes many examples where AI shortcomings belie the fanfare.

via Amy Castor and David Gerard: Pivot to AI: Pay no attention to the man behind the curtain

Former AAAI President Subbarao Kambhampati articulates why LLMs can’t really reason or plan: much of their apparent success is a “Clever Hans” phenomenon.

Local vs. Global

Gary Marcus:

> current systems are good at local coherence, between words, and between pixels, but not at lining up their outputs with a global comprehension of the world. I’ve been worrying about that emphasis on the local at the expense of the global for close to 40 years.

Reasoning Outside the Training Set

Taelin’s bet and its solution: a public $10K bet that no LLM could solve his puzzle; in fact, somebody figured out how to get Claude to do it.

Limitations of Current AI

Mahowald et al. (2023)[2] argue:

Although LLMs are close to mastering formal competence, they still fail at functional competence tasks, which often require drawing on non-linguistic capacities. In short, LLMs are good models of language but incomplete models of human thought.

Chip Huyen presents a well-written list of Open challenges in LLM research:

  1. Reduce and measure hallucinations
  2. Optimize context length and context construction
  3. Incorporate other data modalities
  4. Make LLMs faster and cheaper
  5. Design a new model architecture
  6. Develop GPU alternatives
  7. Make agents usable
  8. Improve learning from human preference
  9. Improve the efficiency of the chat interface
  10. Build LLMs for non-English languages

Mark Riedl lists more of what’s missing from current AI if we want to get to AGI:

The three missing capabilities are inexorably linked. Planning is a process of deciding which actions to perform to achieve a goal. Reinforcement learning — the current favorite path forward — requires exploration of the real world to learn how to plan, and/or the ability to imagine how the world will change when it tries different actions. A world model can predict how the world will change when an agent attempts to perform an action. But world models are best learned through exploration.
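A minimal sketch (hypothetical names, not Riedl’s formulation) of how a learned world model supports planning: the agent imagines the outcome of each candidate action before committing to one.

```python
from typing import Callable, Dict, List

def plan_one_step(
    state: Dict[str, int],
    actions: List[str],
    world_model: Callable[[Dict[str, int], str], Dict[str, int]],  # predicts the next state
    score: Callable[[Dict[str, int]], float],                      # closeness to the goal
) -> str:
    """Pick the action whose *imagined* next state scores best."""
    return max(actions, key=lambda a: score(world_model(state, a)))

# Toy usage: step along a line toward position 10.
world_model = lambda s, a: {"pos": s["pos"] + (1 if a == "right" else -1)}
score = lambda s: -abs(10 - s["pos"])
print(plan_one_step({"pos": 3}, ["left", "right"], world_model, score))  # 'right'
```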

Language Translation

#China Jeff Ding points to this English translation of a Weixin post, “Did AI kill the translator?”, with an example of a subtle mistranslation:

In this translation, it seems correct to literally translate “limited evidence” as “evidence with some limits”, and the translation is also smooth. However, in an academic context, “limited evidence” does not simply mean that the amount of evidence is small, but that the amount and reliability of the evidence are so limited that it is not enough to support a certain conclusion.

and concludes that:

Excellent human translators, like chefs at Michelin three-star restaurants, will become ‘luxury service’ providers, serving only those customers who have very high requirements for translation quality.

Business Challenges

Paul Kedrosky & Eric Norlin of SK Ventures offer details for why AI Isn’t Good Enough:

The trouble is—not to put too fine a point on it—current-generation AI is mostly crap. Sure, it is terrific at using its statistical models to come up with textual passages that read better than the average human’s writing, but that’s not a particularly high hurdle.

How do we know if something is so-so vs Zoso automation? One way is to ask a few test questions:

  1. Does it just shift costs to consumers?

  2. Are the productivity gains small compared to worker displacement?

  3. Does it cause weird and unintended side effects?

Here are some examples of the preceding, in no particular order. AI-related automation of contract law is making it cheaper to produce contracts, sometimes with designed-in gotchas, thus causing even more litigation. Automation of software is mostly producing more crap and unmaintainable software, not rethinking and democratizing software production itself. Automation of call centers is coming fast, but it is like self-checkout in grocery stores, where people are implicitly being forced to support themselves by finding the right question to ask.


Raising the bar on the purpose of writing

My initial testing of the new ChatGPT system from OpenAI has me impressed enough that I’m forced to rethink some of my assumptions about the importance and purpose of writing.

On the importance of writing, I refuse to yield. Forcing yourself to write is the best way to force yourself to think. If you can’t express yourself clearly, you can’t claim to understand.

But GPT and the LLM revolution have raised the bar on the type and quality of writing – and, for that matter, on much of white-collar labor. Too much writing and professional work is mechanical, in the same way that natural language translation systems have shown translation work to be mechanical. Given a large enough corpus of example sentences, you can generate new sentences by merely shuffling words in grammatically correct ways.
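A toy illustration of that “shuffling words” point (my sketch, not a claim about how LLMs are implemented): even a bigram Markov chain produces plausible-looking sentences purely from co-occurrence statistics, with no model of meaning.

```python
# Bigram Markov chain: generate text by repeatedly picking a word that
# followed the current word somewhere in the corpus.
import random
from collections import defaultdict

corpus = (
    "forcing yourself to write is the best way to force yourself to think . "
    "if you cannot express yourself clearly you cannot claim to understand ."
).split()

# Map each word to the words that follow it in the corpus.
following = defaultdict(list)
for w, nxt in zip(corpus, corpus[1:]):
    following[w].append(nxt)

word, sentence = "forcing", ["forcing"]
while word != "." and len(sentence) < 20:
    word = random.choice(following[word])
    sentence.append(word)
print(" ".join(sentence))
```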

What humans can do

So where does this leave us poor humans? You’ll need to focus on the things that don’t involve simple pattern-matching across zillions of documents. Instead, you’ll need to generate brand new ideas, with original insights that cannot be summarized from previous work.

Nassim Taleb distinguishes between verbalism and true meaning. Verbalism is about words that change their meaning depending on the context. We throw around terms like “liberal” or “populist”, labels that are useful for expression but not for real thought. Even terms that have mathematical rigor can take on different meanings when used carelessly in everyday conversation: “correlation”, “regression”.

Nassim Taleb on “Verbalisms”

Mathematics doesn’t allow for verbalism. Everything must be very precise.

These precise terms are useful for thought. Verbalism is useful for expression.

Remember that GPT is verbalism to the max degree. Even when it appears to be using precise terms, and even when those precise terms map perfectly to Truth, you need to remember that it’s fundamentally not the same thing as thinking.


References

Berglund, Lukas, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans. 2023. “The Reversal Curse: LLMs Trained on "A Is B" Fail to Learn "B Is A".” arXiv. http://arxiv.org/abs/2309.12288.
Mahowald, Kyle, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, and Evelina Fedorenko. 2023. “Dissociating Language and Thought in Large Language Models: A Cognitive Perspective.” arXiv. http://arxiv.org/abs/2301.06627.
Raji, Inioluwa Deborah, I. Elizabeth Kumar, Aaron Horowitz, and Andrew D. Selbst. 2022. “The Fallacy of AI Functionality.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, 959–72. https://doi.org/10.1145/3531146.3533158.

Footnotes

  1. Cal Newport, “Can an AI Make Plans?”, The New Yorker.↩︎

  2. (via Mike Calcagno; see Zotero)↩︎